Let Sense Bags Do Talking: Cross Lingual Word Semantic Similarity for English and Hindi

نویسندگان

  • Apurva Nagvenkar
  • Jyoti Pawar
  • Pushpak Bhattacharyya
چکیده

Cross Lingual Word Semantic (CLWS) similarity is defined as a task to find the semantic similarity between two words across languages. Semantic similarity has been very popular in computing the similarity between two words in same language. CLWS similarity will prove to be very effective in the area of Cross Lingual Information Retrieval, Machine Translation, Cross Lingual Word Sense Disambiguation, etc. In this paper, we discuss a system that is developed to compute CLWS similarity of words between two languages, where one language is treated as resourceful and other is resource scarce. The system is developed using WordNet. The intuition behind this system is that, two words are semantically similar if their senses are similar to each other. The system is tested for English and Hindi with the accuracy 60.5% precision@1 and 72.91% precision@3.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

OWNS: Cross-lingual Word Sense Disambiguation Using Weighted Overlap Counts and Wordnet Based Similarity Measures

We report here our work on English French Cross-lingual Word Sense Disambiguation where the task is to find the best French translation for a target English word depending on the context in which it is used. Our approach relies on identifying the nearest neighbors of the test sentence from the training data using a pairwise similarity measure. The proposed measure finds the affinity between two...

متن کامل

Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian

We explore the use of unsupervised methods in Cross-Lingual Word Sense Disambiguation (CL-WSD) with the application of English to Persian. Our proposed approach targets the languages with scarce resources (low-density) by exploiting word embedding and semantic similarity of the words in context. We evaluate the approach on a recent evaluation benchmark and compare it with the state-of-the-art u...

متن کامل

English-Persian Plagiarism Detection based on a Semantic Approach

Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...

متن کامل

An English-Chinese Cross-lingual Word Semantic Similarity Measure Exploring Attributes and Relations

Word semantic similarity measuring is a fundamental issue to many NLP applications and the globalization has made an urgent request for cross-lingual word similarity measure. This paper proposed a word semantic similarity measure which is able to work in cross-lingual scenarios. Basically, a concept can be defined by a set of attributes. The basic idea of this work is to compute the similarity ...

متن کامل

Cross-Lingual Word Sense Disambiguation

Word Sense Disambiguation using Cross-Lingual approach has been used successfully for languages like Farsi and Hindi. However, a comparable corpus in the form of Wikipedia articles available in English and Hindi has been used for such a task. This motivated us to further the approach and test the results when a parallel corpus is used. In this project, we specifically wanted to observe if the a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015